Introduce the lesson and ask students, what are data?
For reference:
Wikipedia - Data are characteristics or information, usually numerical, that are collected through observation.[1] In a more technical sense, data is a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable. [2]
Merriam-Webster - Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
TechTerms - Computer data is information processed or stored by a computer. This information may be in the form of text documents, images, audio clips, software programs, or other types of data.
Dictionary.com - A plural of datum. Datum: A single piece of information, as a fact, statistic, or code; an item of data. In Philosophy 1) any fact assumed to be a matter of direct observation. 2) Any proposition assumed or given, from which conclusions may be drawn.
Ask students to provide examples of data from their daily life or from their own course work.
The world today is filled with science news, from climate change to hurricanes – and it all affects the people inhabiting the earth. The label “data” is all-encompassing and can be categorized in several ways. These graphics help us classify types of data.
Ask students to think of some data related to any of these categories. Then ask them to try to define a data repository.
Some leading questions:
The amount of available storage is not keeping up with the amount of data flooding in daily. How do we decide what data we keep?
Ask students to discuss where data from research seem to end up. Since we already introduced repositories in the earlier slide, this question is not necessarily meant to reinforce the concept. Rather, we want students to consider where they think research data typically go and how that fits into the scientific process.
Because researchers require time to verify data, analyze their data, and derive research conclusions, individual researchers generally are not expected to make all their data public immediately.
New tools for automatically assessing the quality of data and sharing them with others can facilitate the rapid sharing of digital data, although verifying the reliability of these tools presents its own set of challenges.
Once a research result is published, the norms of science—and often the terms of the research grant or contract—call for the supporting data to be accessible.
Researchers may nevertheless try to keep the data private, perhaps to derive additional results without competition from others, for the exclusive use of a student or postdoctoral fellow whose career would be advanced by generating further papers, or just to avoid the effort to put the data in a usable form for others. In the worst cases, they may retain data to hide acts of research misconduct or to conceal defects in the dataset.
Sometimes researchers may want to keep the data private just out of respect for the type of data collected. Some data contains sensitive information such as people’s names, racial or ethnic origin, political opinions, religious or philosophical beliefs.
The norms of a research community may allow keeping data private for a certain period. These norms can be formalized through the terms of a grant, giving the investigator a defined period of exclusive use of the data, with the exclusivity ending upon the publication of results, after a particular length of time, or when data are deposited in a data center or archive.
There is great variation among research fields in their data-sharing norms, to such an extent that different fields can be said to have different data cultures.
Data can be lost in numerous ways, irrevocably damaging the integrity and reproducibility of the associated research. That is exactly why researchers should have a plan in mind for their data.
Let’s circle back to repositories now that we’ve explained the potential “why” for their use. Repositories have become a significant component of the infrastructure for research data. A repository is both a system and set of services designed as an archive for digital data with context, fixity, and persistence. In addition to data, repositories maintain and preserve metadata.
Ask students to find a data repository online in groups then share their findings with the class.
Good starting point here or here.
Key objectives from the activity:
While we might have come up with what a repository is as a class, now take a moment to explicitly define the uses of a data repository.
Data repositories provide:
Now that we have established why using a repository is important, we want to then tie this into the philosophy of open science.
Open science is the movement to make scientific research and data accessible to all. In many ways, the reasons why we should store scientific data in a repository are the same reasons why we should embrace the culture of open science.
Optional Additional Materials
As we discussed in lesson one, well-managed data can result in re-use, integration, and new science. But data are not insular. Creating quality data that are well-organized, documented, preserved, accessible, and verifiable inherently depends on keeping good records about those data. In the data life cycle, describing data is a fundamental part of data management.
Metadata are documentation describing the content, context, and structure of data to enable future interpretation and reuse of the data. Generally, metadata describe who collected the data, what data were collected, when and where they were collected, and why they were collected.
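The who/what/when/where/why structure described above can be sketched as a minimal metadata record. The field names below are illustrative only, invented for teaching; real repositories use formal standards such as EML or ISO 19115:

```python
# A minimal, illustrative metadata record capturing the
# who / what / when / where / why of a hypothetical dataset.
metadata = {
    "title": "Soil temperature at a tundra site (example)",
    "creator": "A. Researcher",                          # who collected the data
    "abstract": "Hourly soil temperature measured "      # what was collected,
                "to study permafrost thaw.",             # and why
    "temporal_coverage": ("2020-06-01", "2020-08-31"),   # when
    "spatial_coverage": {"lat": 68.6, "lon": -149.6},    # where
}

def summarize(record):
    """Return a one-line summary, the kind a search index might display."""
    start, end = record["temporal_coverage"]
    return f'{record["title"]} ({record["creator"]}, {start} to {end})'
```

Even this toy record shows why metadata enable discovery: a search system can index the title, creator, dates, and location without ever opening the data files themselves.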
Metadata serves data discovery at multiple levels:
Ask students to find a dataset in the Arctic Data Center repository. They can search by location or by keyword – or you can use this dataset about the phenological mismatch in the Arctic.
Ask students to find and record key metadata such as the title, abstract, dataset creator, author, and location.
Finally, in groups, ask students to dissect a paper such as this one on snow melt or this one on how pollution affects Arctic cloud development. Have them highlight or underline the parts of the paper that would be entered into a repository as metadata. Next, ask the students to compare that with how the data was actually stored at the Arctic Data Center by viewing the citation at the bottom of the paper. Discuss differences.
Optional Additional Materials
Data citation is the practice of referencing data products used in research. A data citation includes key descriptive information about the data, such as the title, source, and responsible parties. As we touched upon in the previous lesson on metadata, data authors serve an important role in storing data. They also create data citations. These citations are used when data are referenced in a scientific study, education, or other activities. While it may seem like data citation is part of the “preserve” phase, it is actually part of the “describe” phase. Even though it is standard practice to cite scientific articles, it is not yet common practice to cite data. Both long- and short-term goals are helping change the narrative around research by encouraging adoption of practices that are in line with open science.
Much like metadata, a data citation has numerous optional components, and, also like the metadata record, the more information you provide, the more complete and replicable the citation will be. The data citation is accessed through a persistent identifier, which is also where you can find the metadata.
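As a rough sketch of how those components come together, the function below assembles a generic citation in the common creator / year / title / publisher / identifier ordering. The format is not any one style guide's, and the DOI shown is hypothetical:

```python
def format_data_citation(creator, year, title, publisher, doi):
    """Assemble a generic data citation from its key components.
    The ordering mirrors common practice but is not tied to any
    particular citation style."""
    return f"{creator} ({year}). {title}. {publisher}. https://doi.org/{doi}"

citation = format_data_citation(
    creator="Doe, J.",
    year=2021,
    title="Example Arctic snow depth measurements",
    publisher="Arctic Data Center",
    doi="10.18739/EXAMPLE",   # hypothetical DOI, for illustration only
)
```

Note that the persistent identifier is embedded directly in the citation, so anyone reading the citation can follow it to the metadata and data.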
Ask students to download the dataset files from the dataset they were using in the metadata lesson. Have the students open up the files and familiarize themselves with the file types associated with this dataset. Using the dataset they downloaded, assign this assignment for the students to complete in their own time.
Below are the answers to the take home assignment.
Importance: What professional norms and practices, both of individuals and of institutions or organizations, support or undermine the idea that data are legitimate and citable products of research? How?
Possible answers include: promotion and tenure criteria at universities (data may or may not be valued as scholarly contributions); research funders' policies (they may encourage demonstrating the availability of datasets created with prior funding and listing data in CVs, and may require sharing data); publishers, who may encourage or require data sharing and citation; and data centers, which may provide recommended citations for datasets.
Credit and attribution: Credit and attribution of more traditional types of research products is an established norm and practice; is extending this practice to include data a simple and natural thing to do? Why or why not?
On the surface, yes, but questions of what constitutes an authorship role may be an issue (see Duke and Porter paper), note also some of the possible responses to question 1.
Evidence: Citing literature to support claims is also an established practice; is extending this practice to include data a simple and natural thing to do? Why or why not?
Not all data are in a data center or repository and formally citable; authors may not be familiar with data citation practices.
Unique identification: Is it always possible for a data creator to obtain a persistent identifier for their data set? Why or why not?
The repository or data center may or may not supply a persistent identifier, data creator may or may not have access to an identifier service to create one themselves; identifiers must be maintained in order to continue to work.
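The mechanics of a persistent identifier can be shown in a few lines: a resolver service (doi.org, in the case of DOIs) maintains the mapping from identifier to the dataset's current landing page, which is why the identifier keeps working even if the data move. A minimal sketch, using a hypothetical DOI:

```python
def doi_to_url(doi):
    """Turn a bare DOI into a resolvable URL via the doi.org resolver.
    The resolver redirects to the dataset's current landing page;
    maintaining that redirect is what makes the identifier persistent."""
    return f"https://doi.org/{doi.strip()}"

# hypothetical identifier, for illustration only
landing_url = doi_to_url("10.18739/EXAMPLE")
```

This also illustrates the answer above: the URL only keeps working if someone maintains the identifier-to-location mapping behind it.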
Access: In practice, do data citations always provide direct access to the data set? Why or why not?
Sometimes. Some data centers provide only metadata and contact information, and the data must be obtained from the data creator or another party.
Optional Additional Materials:
Even though not always recognized as such, the data life cycle is an inherent part of the scientific process and is only beginning to be taught as such. Take a look at the steps in the scientific process… What does “experiment” even mean in today’s research arena?
A modern experiment can mean a new evaluation of data collected from other researchers, floating buoys, sensors, or even internet algorithms. Regardless of the experiment type, research is based upon data. Should another researcher attempt to replicate the experiment, they would have to reacquire the data themselves (time and money permitting) or ask to use the original set. How do we preserve the integrity of research if we cannot replicate it? The data life cycle.
The data life cycle below has eight components. Each step helps the researcher organize, manage, and preserve their data to improve the chances of their data being used effectively by others.
Ask students to complete the ADC handout, “Describing the Data Life Cycle.” The DataONE Skillbuilding Hub is a good resource. Then, discuss and define the components of the data life cycle as a class.
Optional Additional Materials
Objectives: By the end of this lesson, students will be able to…
Learned skills:
Course Format:
This module is meant to be taught within the timeframe of one lab period (approximately 2 hours). Included in this lesson is an introductory video and a prepared PowerPoint, with time allotted for exploration of the Arctic Data Center’s repository and completion of an in-class assignment. Instructors may want to assign students additional out-of-class work, or use additional modules to round out a full unit.
Suggested readings: There are no required readings for this course, but feel free to recommend this introductory paper, “Skills and Knowledge for Data-Intensive Environmental Research”.
The Arctic Data Center is a place for researchers from around the world working in the Arctic to efficiently share, discover, access, and interpret complex data about the Arctic with less effort. Part of our mission is to provide hands-on training at Arctic research conferences and in dedicated training sessions targeting Arctic researchers, especially early-career and under-represented populations. In 2020, we were able to support the development of undergraduate-level educational materials through our fellowship program. These materials are open source and available for reuse and/or modification at the associated GitHub repository, the Arctic Data Center Training page, or the DataONE Skillbuilding Hub.
With over 5,700 datasets in the Arctic Data Center repository, these education modules are intended to equip students with the tools and resources to unearth the story behind the data of one of the planet’s fastest-changing ecosystems: the Arctic.
Encompassing Earth’s northernmost region, the Arctic is an icy sea surrounded by land, characterized by a harsh climate with extreme variation in light and temperature. Diverse landscapes—from the sea ice to coastal wetlands, upland tundra, mountains, wide rivers, and the sea itself—support abundant wildlife and many cultures, making the Arctic a region like no other in the world.
The Arctic is warming twice as fast as the global average. The cause of such rapid warming is straightforward and well understood: It is human-caused climate change, and it is altering the relative amount of the Sun’s energy that is absorbed, reflected, or radiated in the Arctic.
According to the 15th annual NOAA Arctic Report Card, the sea-ice extent in October 2020 dropped to the lowest level on record. Dramatic drops in Arctic ice like this are the main driver of rapid Arctic changes. Effects of a warming atmosphere on physical, chemical, biological, and human components of Arctic ecosystems are myriad, far-reaching, and accelerating. There is no facet of Arctic life that will remain untouched by the immensity of these changes.
Studying and understanding the Arctic is essential to saving it, and ultimately ourselves. The Arctic Data Center repository is rich with real-world Arctic data just waiting to be explored. If you’re ready to learn how, let’s get started.